In this assignment, you will perform basic analysis for the San Francisco Housing Market to allow potential real estate investors to choose rental investment properties.
# initial imports
import os
import pandas as pd
import matplotlib.pyplot as plt
import hvplot.pandas
import plotly.express as px
from pathlib import Path
from dotenv import load_dotenv
%matplotlib inline
# Add additional imports
import numpy as np
import panel as pn
from panel.interact import interact
from panel import widgets
# Read the Mapbox API key
load_dotenv()
# mapbox_token = os.getenv("MAPBOX")
mapbox_token = os.getenv("MAPBOX_API_KEY")
# the .env file with key will be removed from local folder before uploading to Github Repo
# Read the census data into a Pandas DataFrame
file_path = Path("Data/sfo_neighborhoods_census_data.csv")
sfo_data = pd.read_csv(file_path, index_col="year")
sfo_data.head()
In this section, you will calculate the number of housing units per year and visualize the results as a bar chart using the Pandas plot function.
Hint: Use the Pandas groupby function
Optional challenge: Use the min, max, and std to scale the y limits of the chart.
# Calculate the mean number of housing units per year (hint: use groupby)
# YOUR CODE HERE!
# First group by year, then drop columns that are not requested to show
sfo_data_grp1 = sfo_data.groupby('year').mean()
sfo_data_grp2 = sfo_data_grp1.drop(columns=['sale_price_sqr_foot','gross_rent'])
sfo_data_grp2.head
# Use the Pandas plot function to plot the average housing units per year.
# Note: You will need to manually adjust the y limit of the chart using the min and max values from above.
# YOUR CODE HERE!
sfo_data_grp2.hvplot.bar(label='Housing Units per year', x='year',y='housing_units', xlabel='Year', ylabel='Housing Units')
# Optional Challenge: Use the min, max, and std to scale the y limits of the chart
# YOUR CODE HERE!
# plt.show()
# plt.close(fig_housing_units)
In this section, you will calculate the average gross rent and average sales price for each year. Plot the results as a line chart.
# Calculate the average gross rent and average sale price per square foot
# YOUR CODE HERE!
sfo_data_avg1=sfo_data
sfo_data_avg1.head()
# First group by year, then drop columns that are not requested to show
sfo_data_grp11 = sfo_data.groupby('year').mean()
sfo_data_grp12 = sfo_data_grp1.drop(columns=['housing_units'])
sfo_data_grp12.head
# Plot the Average Gross Rent per Year as a Line Chart
# YOUR CODE HERE!
sfo_data_grp13 = sfo_data_grp12.drop(columns=['sale_price_sqr_foot'])
sfo_data_grp13.plot(title='Average Gross Rent in San Fransisco',legend=False)
# Plot the Average Sales Price per Year as a line chart
# YOUR CODE HERE!
sfo_data_grp14 = sfo_data_grp12.drop(columns=['gross_rent'])
sfo_data_grp14.plot(title='Average Sale Price per Square Foot in San Fransisco',legend=False)
In this section, you will use hvplot to create an interactive visulization of the Average Prices with a dropdown selector for the neighborhood.
Hint: It will be easier to create a new DataFrame from grouping the data and calculating the mean prices for each year and neighborhood
# Group by year and neighborhood and then create a new dataframe of the mean values
# YOUR CODE HERE!
# First group by year, then drop columns that are not requested to show
# sfo_data_grp21 = sfo_data.groupby(['year','neighborhood']).mean()
sfo_data_grp21 = sfo_data
sfo_data_grp21.reset_index(inplace=True)
sfo_data_grp21.head(10)
# Use hvplot to create an interactive line chart of the average price per sq ft.
# The plot should have a dropdown selector for the neighborhood
# YOUR CODE HERE!
# Slice data
sfo_data_grp22 = sfo_data_grp21.drop(columns=['housing_units','gross_rent'])
sfo_data_grp22.groupby(['year','neighborhood']).mean().sort_values('year')
sfo_data_grp22.head
sfo_data_grp22.hvplot('year','sale_price_sqr_foot')
In this section, you will need to calculate the mean sale price for each neighborhood and then sort the values to obtain the top 10 most expensive neighborhoods on average. Plot the results as a bar chart.
# Getting the data from the top 10 expensive neighborhoods
# YOUR CODE HERE!
# First Sort and then group-by
sfo_data_grp31 = sfo_data
# sfo_data_grp31.reset_index(inplace=True)
sfo_data_grp32 = sfo_data_grp31.drop(columns=['year'])
sfo_data_grp33 = sfo_data_grp32.groupby('neighborhood').mean().sort_values('sale_price_sqr_foot',ascending=False)
sfo_data_grp33.head(10)
# Plotting the data from the top 10 expensive neighborhoods
# YOUR CODE HERE!
# Create a DF with Top 10 Records
sfo_data_grp34 = sfo_data_grp33.iloc[0:10]
# Create a DF by dropping columns
sfo_data_grp35 = sfo_data_grp34.drop(columns=['housing_units','gross_rent'])
sfo_data_grp35.hvplot.bar(label='Top 10 Expensive Neighborhoods', x='neighborhood',y='sale_price_sqr_foot', xlabel='Neighborhood', ylabel='Sales price Sqr Foot')
In this section, you will use plotly express to create parallel coordinates and parallel categories visualizations so that investors can interactively filter and explore various factors related to the sales price of the neighborhoods.
Using the DataFrame of Average values per neighborhood (calculated above), create the following visualizations:
# Parallel Coordinates Plot
# YOUR CODE HERE!
px.parallel_coordinates(sfo_data_grp34)
# Parallel Categories Plot
# YOUR CODE HERE!
sfo_data_grp41 = sfo_data_grp34
sfo_data_grp41.reset_index(inplace=True)
px.parallel_categories(sfo_data_grp41)
In this section, you will read in neighboor location data and build an interactive map with the average prices per neighborhood. Use a scatter_mapbox from plotly express to create the visualization. Remember, you will need your mapbox api key for this.
# Load neighborhoods coordinates data
file_path = Path("Data/neighborhoods_coordinates.csv")
df_neighborhood_locations = pd.read_csv(file_path)
df_neighborhood_locations.head()
You will need to join the location data with the mean prices per neighborhood
# Calculate the mean values for each neighborhood
# YOUR CODE HERE!
# Drop column - Year, Group By Neighborhood, then reset-index
sfo_data_grp51 = sfo_data.drop(columns=['year']).groupby('neighborhood').mean()
sfo_data_grp51.reset_index(inplace=True)
sfo_data_grp51.head()
# Join the average values with the neighborhood locations
# YOUR CODE HERE!
df_neighborhood_locations_11 = df_neighborhood_locations.set_index('Neighborhood')
sfo_data_grp52 = sfo_data_grp51.set_index('neighborhood')
merged_data = pd.concat([df_neighborhood_locations_11, sfo_data_grp52],axis='columns', join = 'inner')
merged_data_11=merged_data.reset_index()
merged_data_12=merged_data_11.rename(columns={'index': 'Neighborhood'})
merged_data_12.head()
Plot the aveage values per neighborhood with a plotly express scatter_mapbox visualization.
# Create a scatter mapbox to analyze neighborhood info
# YOUR CODE HERE!
# Set the Mapbox API
px.set_mapbox_access_token(mapbox_token)
map = px.scatter_mapbox(
merged_data_12,
title='Average Sales Price per Square Foot and Gross Rent in San Francisco',
lat='Lat',
lon='Lon',
size='sale_price_sqr_foot',
color='gross_rent',
hover_name='Neighborhood',
zoom=11
)
map.show()